1 Introduction



This document describes the logical solver introduced in [Geneves, 2006;
Geneves et al., 2007] and provides informal documentation for using its
implementation.

The solver allows automated verification of properties that are expressed as
logical formulas over trees. A logical formula may, for instance, express
structural constraints or navigation properties (such as path existence and
node selection) in finite trees.

A decision procedure for a logic traditionally defines a partition of the set of
logical formulas: formulas which are satisfiable (there is a tree which satisfies
the formula) and remaining formulas which are unsatisfiable (no tree satisfies
the given formula). Alternatively (and equivalently), formulas can be divided
into valid formulas (formulas which are satisfied by all trees) and invalid
formulas (formulas that are not satisfied by at least one tree). The solver is a
satisfiability-testing solver: it allows checking satisfiability (or
unsatisfiability) of a given logical formula. Note that validity of a formula φ
can be checked by testing ¬φ for unsatisfiability.

The solver can be used for reasoning over finite ordered trees, whatever these
trees actually represent. In particular, the logic and the solver are
specifically adapted for formulating and solving problems over XML tree
structures [Bray et al., 2004]. The logic can express navigational properties
like those expressed with the XPath standard language [Clark and DeRose, 1999]
for navigating and selecting sets of nodes from XML trees. Additionally, the
logic is expressive enough to encode any regular tree language property (it
subsumes finite tree automata). It can encode constraints definable with common
XML tree type definition languages (such as DTD [Bray et al., 2004], XML Schema
[Fallside and Walmsley, 2004], and Relax NG [Clark and Murata, 2001]). The logic
provides high-level constructs specifically designed for reasoning directly with
such XML concepts: the user can directly write an expression using XPath
notation in the logic, or even refer to an XML type in the logic. These
characteristics make the system especially useful for solving problems like
those encountered in the static analysis of XML code, static verification of XML
access control policies, XML data security checking, XML query optimization, and
the construction of static type-checkers and optimizing compilers for a wide
variety of tree-manipulating programs and XML processors.

Outline This user manual is organized as follows: Section 2 describes the
basics for using the solver without requiring any logical knowledge; Section 3
gives some insights on the logic, especially on the simple yet general data tree
model used by the logic (Section 3.1) and on the syntax of logical formulas
(Section 3.2), including high-level constructs for embedding XPath expressions
and XML tree types directly in the logic. Finally, Section 4 provides an
overview of the background theory underlying the logic and its solver, with
related references.



2 Getting Started with XML Applications 

The logical solver is shipped as a compressed file which, once extracted, provides
binaries along with all required libraries. The "solver.jar" executable file
takes a filename as a parameter (see footnote 1). The filename refers to a text
file containing the logical formula to solve. For example, provided a recent
(see footnote 2) Java runtime engine is installed, the following command line:



java -jar solver.jar formula.txt



runs the solver on the logical formula contained in "formula.txt". The full
syntax of logical formulas is given in Section 3.2. The following examples
introduce the logical formulation of some simple yet fundamental XML problems,
and how the solver output should be interpreted.



Example 1: emptiness test for an XPath expression. The most basic
decision problem for a query language is the emptiness test of an expression:
whether or not a query is self-contradictory and always yields an empty result.
This test is important for error detection and optimization of host language
implementations, i.e. implementations that process languages in which XPath
expressions are used. For instance, if one can decide at compile time that a
query result is empty, then subsequent bound computations can be ignored. For
checking emptiness of the XPath expression a/b[following-sibling::c/parent::d],
the contents of the "example1.txt" file simply consist of the following line:

example1.txt

select("a/b[following-sibling::c/parent::d]")



Running the solver with "example1.txt" as parameter yields the following
trace:

Output for example1.txt

Reading example1.txt

Satisfiability Tested Formula:

(mu X5.(((b & (mu X2.(<-1>(a & (mu X1.(<-1>T | <-2>X1))) | <-2>X2)))
& (mu X4.(<2>((mu X3.(<-1>d | <-2>X3)) & c) | <2>X4))) | (<1>X5 | <2>X5)))

Computing Relevant Closure
Computed Relevant Closure [1 ms].
Computed Lean [1 ms].

Lean size is 20. It contains 14 eventualities and 6 symbols.

Computing Fixpoint [4 ms].

Formula is unsatisfiable [14 ms].



The input XPath expression is first parsed and compiled into the logic. The
corresponding logical translation, whose satisfiability is going to be tested,
is printed. The solver then computes the Fisher-Ladner closure and the Lean of
the formula: the set of all basic subformulas that notably defines the search
space that is going to be explored by the solver (see [Geneves et al., 2007] for
details). The solver attempts to build a satisfying tree in a bottom-up way,
in the manner of a fixpoint computation that iteratively updates a set of tree
nodes. This computation is performed in at most $2^{O(n)}$ steps with respect to
the size $n$ of the Lean.

1 Running the command "java -jar solver.jar" prints the list of required and
optional arguments.

2 A Java virtual machine version 1.5.0 (or later) and a Java compiler compliance
level version 5.0 (or later).

In this example, no satisfying tree is found: the formula is unsatisfiable
(in other terms, no matter on which XML document this XPath expression is
evaluated, it will always yield an empty result). Intuitively, that is because
this XPath expression contains a contradiction: according to the query, the same
node is required to be named both "a" and "d", which is not allowed for an
XML tree.

Empty queries often come from the use of an XPath expression in a con- 
strained setting. The combination of navigational information of the query and 
structural constraints imposed by a DTD (or XML Schema) may rapidly yield 
contradictions. Such contradictions can also be detected by checking a logical 
formula for satisfiability. 
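
When the solver is driven from a script, the verdict can be read back from the
trace. The following is a minimal sketch in Python, assuming the trace format
shown in this section; the helper names parse_verdict and solver_command are
hypothetical and are not part of the solver distribution.

```python
import re
from typing import List, Optional, Tuple

# Matches the verdict line as it appears in the traces of this manual,
# e.g. "Formula is unsatisfiable [14 ms]."
_VERDICT = re.compile(r"Formula is (satisfiable|unsatisfiable) \[(\d+) ms\]")


def parse_verdict(trace: str) -> Optional[Tuple[bool, int]]:
    """Return (is_satisfiable, reported_ms), or None if no verdict line is found."""
    match = _VERDICT.search(trace)
    if match is None:
        return None
    return match.group(1) == "satisfiable", int(match.group(2))


def solver_command(formula_file: str, jar: str = "solver.jar") -> List[str]:
    """Build the command line given in Section 2 (the command is not run here)."""
    return ["java", "-jar", jar, formula_file]
```

The command list can be handed to subprocess.run and the captured output to
parse_verdict; that the trace is written to standard output is an assumption.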



Example 2: checking XPath emptiness in the presence of tree constraints.
Suppose we want to check emptiness of the XPath expression

descendant::switch[ancestor::head]/descendant::seq/
descendant::audio[preceding-sibling::video]

over the set of documents defined by the DTD of the SMIL language [Hoschka,
1998]. The following formula is used:

example2.txt

select("descendant::switch[ancestor::head]/descendant::seq/
descendant::audio[preceding-sibling::video]",
type("sampleDTDs/smil.dtd", "smil"))



The first argument for the predicate type() is a path to the DTD file (here the
DTD is assumed to be located in a subdirectory called "sampleDTDs"), and
the second argument is the name of the element to be considered as top-level
start symbol. Running the solver with this "example2.txt" file as parameter
yields the following trace:

Output for example2.txt

Reading example2.txt

Converted tree grammar into BTT [169 ms].
Translated BTT into Tree Logic [60 ms].

Satisfiability Tested Formula:

(mu X22.(((audio & (mu X20.(<-1>((seq & (mu X19.(<-1>(((switch
& (mu X17.(<-1>(
(let_mu

X1=(((meta & ~(<1>T)) & ~(<2>T)) | ((meta & ~(<1>T)) & <2>X1)),
X16=((smil & (~(<1>T) | <1>X15)) & ~(<2>T))

in

X16) | X17) | <-2>X17))) & (mu X18.(<-1>(head | X18) | <-2>X18)))
| X19) | <-2>X19))) | X20) | <-2>X20))) &
(mu X21.(<-2>video | <-2>X21))) | (<1>X22 | <2>X22)))

Computing Relevant Closure
Computed Relevant Closure [39 ms].
Computed Lean [1 ms].

Lean size is 50. It contains 31 eventualities and 19 symbols.

Computing Fixpoint [37 ms].

Formula is satisfiable [99 ms].

A satisfying finite binary tree model was found [52 ms]:
smil(head(switch(seq(video(#, audio), layout), meta), #), #)
In XML syntax:

<smil xmlns:solver="http://wam.inrialpes.fr/xml" solver:context="true">
  <head>
    <switch>
      <seq>
        <video/>
        <audio solver:target="true"/>
      </seq>
      <layout/>
    </switch>
    <meta/>
  </head>
</smil>



The referred external DTD (tree grammar) is first parsed, converted into an
internal representation on binary trees (called "BTT", which corresponds to
the mapping described in Section 3.1), and then compiled into the logic. The
XPath expression is also parsed and compiled into the logic so that the global
formula can be composed. In that case, the formula is satisfiable (the XPath
expression is non-empty in the presence of this DTD). The solver outputs a
sample tree for which the formula is satisfied. This sample tree is enriched
with specific attributes: the "solver:target" attribute marks a sample node
selected by the XPath expression when evaluated from a node marked with
"solver:context".



Example 3: checking containment and equivalence between XPath
expressions. One of the most essential problems for a query language is the
containment problem: whether or not the result of one query is always included
in the result of another one. Containment for XPath expressions is for instance
needed for the static type-checking of XPath queries, for the control-flow
analysis of XSLT [Clark, 1999], for checking integrity constraints in XML
databases, and for XML data security.

Suppose for instance that we want to check containment between the following
XPath expressions:



descendant::d[parent::b]/following-sibling::a

and:

ancestor-or-self::*/descendant-or-self::b/a[preceding-sibling::d]



Since containment corresponds to logical implication, we actually want to check 
whether the implication of the two corresponding formulas is valid. Since we 
use a satisfiability-testing algorithm, we verify this validity by checking for the 
unsatisfiability of the negated implication, as follows: 
example3.txt

~(select("descendant::d[parent::b]/following-sibling::a", #)
=> select("ancestor-or-self::*/descendant-or-self::b
/a[preceding-sibling::d]", #))



Note that XPath expressions must be compared from the same evaluation con- 
text, which can be any set of nodes, but should be the same set of nodes 
for both expressions. This is denoted by "#". Running the solver with this 
"example3.txt" file results in the following trace: 

Output for example3.txt

Reading example3.txt

Satisfiability Tested Formula:

(mu X26.(((a & (mu X15.((<-2>T & (~(<-2>T) | <-2>((d & (mu X13.((<-1>T
& (~(<-1>T) | <-1>(_context | X13))) | (<-2>T & (~(<-2>T) | <-2>X13)))))
& (mu X14.((<-1>T & (~(<-1>T) | <-1>b)) | (<-2>T & (~(<-2>T)
| <-2>X14))))))) | (<-2>T & (~(<-2>T) | <-2>X15))))) & ((~(a) |
(mu X22.((~(<-1>T) | <-1>(~(b) | ((~(_context) & (~(<1>T) |
<1>(mu X18.((~(_context) & (~(<1>T) | <1>X18)) & (~(<2>T) |
<2>X18))))) & (mu X20.((~(<-1>T) | <-1>((~(_context) & (~(<1>T) |
<1>(mu X19.((~(_context) & (~(<1>T) | <1>X19)) & (~(<2>T) |
<2>X19))))) & X20)) & (~(<-2>T) | <-2>X20)))))) & (~(<-2>T) |
<-2>X22)))) | (mu X25.((~(<-2>T) | <-2>~(d)) & (~(<-2>T) |
<-2>X25))))) | (<1>X26 | <2>X26)))

Computing Relevant Closure
Computed Relevant Closure [4 ms].
Computed Lean [1 ms].

Lean size is 29. It contains 23 eventualities and 6 symbols.

Computing Fixpoint [8 ms].

Formula is unsatisfiable [22 ms].



The tested formula is unsatisfiable (in other terms: the implication is valid),
so one can conclude that the first XPath expression is contained in the second
XPath expression.

A related decision problem is the equivalence problem: whether or not two
queries always return the same result. It is important for reformulation and
optimization of an expression, which aims at enforcing operational properties
while preserving semantic equivalence. Equivalence is reducible to containment
(bi-implication) and is noted <=> in the logic. Note that the previous XPath
expressions are not equivalent. The reader may check this by using the solver,
which generates the following counter-example tree:

<b xmlns:solver="http://wam.inrialpes.fr/xml">
  <d/>
  <a solver:context="true" solver:target="true"/>
</b>
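
The reduction of containment and equivalence to unsatisfiability can be
scripted. The sketch below, in Python, only assembles the text of the formula to
be written into the input file, using the select, =>, <=> and ~ constructs shown
above. The function names are hypothetical, and XPath expressions containing
double quotes are not handled.

```python
def select(xpath: str, context: str = "#") -> str:
    """Render select("xpath", context) in the concrete syntax of the logic."""
    return f'select("{xpath}", {context})'


def containment_check(e1: str, e2: str) -> str:
    """Formula that is unsatisfiable exactly when e1 is contained in e2."""
    return f"~({select(e1)} => {select(e2)})"


def equivalence_check(e1: str, e2: str) -> str:
    """Formula that is unsatisfiable exactly when e1 and e2 are equivalent."""
    return f"~({select(e1)} <=> {select(e2)})"
```

Both expressions are evaluated from the same context "#", as required above. A
satisfiable result comes with a counter-example tree.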
3 Logical Insights

3.1 Data Model for the Logic 

An XML document is considered as a finite tree of unbounded depth and arity,
with two kinds of nodes respectively named elements and attributes. In such a
tree, an element may have any number of children elements, and may carry zero,
one or more attributes. Attributes are leaves. Elements are ordered whereas
attributes are not, as illustrated on Figure 1. The logic allows reasoning on
such trees. Notice that from an XML perspective, data values are ignored.

<r c="..." a="..." b="...">
  <s d="...">
    <v/><w/><x e="..."/>
  </s>
  <t/>
  <u/>
</r>

Figure 1: Sample XML Tree with Attributes (XML notation).



Unranked and Binary Trees There are bijective encodings between unranked
trees (trees of unbounded arity) and binary trees. Owing to these encodings,
binary trees may be used instead of unranked trees without loss of
generality. The logic operates on binary trees and relies on the "first-child
& next-sibling" encoding of unranked trees. In this encoding, the first
child of a node is preserved in the binary tree representation, whereas siblings
of this node are appended as right successors in the binary representation. The
intuition of this encoding is illustrated on Figure 2 for a sample tree. Trees
can be seen as terms or function calls. More formally, a binary tree $t$ can
be defined by the recursive syntax $t ::= \sigma(t, t') \mid \epsilon$ where
$\sigma$ is a node label and $\epsilon$ denotes the empty tree. Similarly,
unranked trees can be defined as $t ::= \sigma(h)$ where $h$ is a hedge (a
sequence of unranked trees) defined as $h ::= \sigma(h), h' \mid \epsilon$. The
function $f$ that translates unranked trees into binary trees is then defined by
$f(\sigma(h), h') = \sigma(f(h), f(h'))$ and $f(\epsilon) = \epsilon$. The
reverse mapping used for reconstructing unranked trees from binary trees can be
expressed as $f^{-1}(\sigma(t, t')) = \sigma(f^{-1}(t)), f^{-1}(t')$ and
$f^{-1}(\epsilon) = \epsilon$.

Figure 2: Binary Encoding Principle.

Figure 3: Binary Encoding of Tree of Figure 1.

In the remaining part of this manual, the binary representation of a tree
is implicitly considered, unless stated otherwise. From an XML point of view,
notice that only the nested structure of XML elements (which are ordered) is
encoded into binary form like this. XML attributes (which are unordered) are
left unchanged by this encoding. For instance, Figure 3 presents how the sample
tree of Figure 1 is mapped.
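
The "first-child & next-sibling" encoding and its inverse can be sketched in a
few lines of Python. This is an illustration only, not the solver's code: an
unranked tree is assumed to be a pair (label, list of children), a binary tree a
triple (label, first, second), and None stands for the empty tree. Attributes
are left out.

```python
from typing import List, Optional, Tuple

Unranked = Tuple[str, list]   # (label, list of child trees)
Binary = Optional[tuple]      # (label, first, second), or None for the empty tree


def encode(hedge: List[Unranked]) -> Binary:
    """f(sigma(h), h') = sigma(f(h), f(h')) and f(empty) = empty."""
    if not hedge:
        return None
    (label, children), rest = hedge[0], hedge[1:]
    # First successor: encoded children. Second successor: encoded next siblings.
    return (label, encode(children), encode(rest))


def decode(tree: Binary) -> List[Unranked]:
    """Inverse mapping: rebuild the hedge from its binary encoding."""
    if tree is None:
        return []
    label, first, second = tree
    return [(label, decode(first))] + decode(second)
```

Encoding the sample document of Example 2 with this sketch yields the term
smil(head(switch(seq(video(#, audio), layout), meta), #), #) printed by the
solver, with "#" rendered as None.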

3.2 Syntax of Logical Formulas 

Modal Formulas for Navigating in Trees The logic uses two programs for
navigating in binary trees: the program 1 allows navigating from a node down
to its first successor, and the program 2 allows navigating from a node down to
its second successor. The logic also features converse programs -1 and -2 for
navigating upward in binary trees, respectively from the first and second
successors to the parent node. Some basic logical formulas together with
corresponding satisfying binary trees are shown on Table 1. When using XPath
expressions, e.g. select("a[b]"), the XPath expression is automatically compiled
into a logical formula over the binary tree representation (see Section 3.2).

The set of logical formulas is defined by the syntax given on Figure 4, where
the meta-syntax ⟨X⟩⊕ means one or more occurrences of X separated by commas.
Models of a formula are finite binary trees for which the formula is satisfied
at some node. The semantics of logical formulas is formally defined in [Geneves,
2006; Geneves et al., 2007]. Table 1 gives basic formulas that use modalities
for navigating in binary trees and node names.

Recursive Formulas The logic allows expressing recursion in trees through 
the use of a fixpoint operator. For example the recursive formula: 
Sample Formula           Satisfying Binary Tree (term notation)   XML syntax
a & <1>b                 a(b, #)                                  <a><b/></a>
a & <2>b                 a(#, b)                                  <a/><b/>
a & <1>(b & <2>c)        a(b(#, c), #)                            <a><b/><c/></a>
e & <-1>(d & <2>g)       d(e, g)                                  <d><e/></d><g/>
f & <-2>(g & ~<2>T)      none                                     none

Table 1: Sample Formulas using Modalities.





φ ::=                            formula
      T                          true
    | F                          false
    | l                          element name
    | p                          atomic proposition
    | #                          start context
    | φ | ψ                      disjunction
    | φ & ψ                      conjunction
    | φ => ψ                     implication
    | φ <=> ψ                    equivalence
    | (φ)                        parenthesized formula
    | ~φ                         negation
    | <p>φ                       existential modality
    | <l>T                       attribute named l
    | $X                         variable
    | let ⟨$X = φ⟩⊕ in ψ         binder for recursion
    | predicate                  predicate (see Figure 5)

p ::=                            program inside modalities
      1                          first child
    | 2                          next sibling
    | -1                         parent
    | -2                         previous sibling

Figure 4: Syntax of Logical Formulas.
let $X = b | <2>$X in $X

means that either the current node is named b or there is a following sibling of
the current node which is named b. For this purpose, the variable $X is bound to
the subformula b | <2>$X, which contains an occurrence of $X (therefore defining
the recursion). The scope of this binding is the subformula that follows the
"in" symbol of the formula, that is $X. The entire formula can thus be seen as
a compact recursive notation for an infinitely nested formula of the form:

b | <2>(b | <2>(b | <2>(...)))

Recursion allows expressing global properties. For instance, the recursive
formula:

~ let $X = a | <1>$X | <2>$X in $X

expresses the absence of nodes named a in the whole subtree of the current node
(including the current node). Furthermore, the fixpoint operator makes it
possible to bind several variables at a time, which is specifically useful for
expressing mutual recursion. For example, the mutually recursive formula:

let $X = (a & <2>$Y) | <1>$X | <2>$X, $Y = b | <2>$Y in $X

asserts that there is a node somewhere in the subtree such that this node is 
named a and it has at least one sibling which is named b. Binding several 
variables at a time provides a very expressive yet succinct notation for expressing 
mutually recursive structural patterns (that may occur in DTDs for instance). 
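
The meaning of these recursive formulas can be explored with a small evaluator.
The following Python sketch is an illustration under stated assumptions, not the
solver's algorithm: it handles only the forward programs 1 and 2, represents
formulas as nested tuples (a hypothetical encoding), and unfolds variables
naively. That unfolding terminates only when every variable occurrence sits
under a modality, as in the three formulas above.

```python
def holds(formula, node, env=None):
    """Check a formula at a node of a binary tree (label, first, second)."""
    env = env or {}
    kind = formula[0]
    if kind == "T":
        return True
    if kind == "name":                     # element name test
        return node[0] == formula[1]
    if kind == "not":
        return not holds(formula[1], node, env)
    if kind == "or":
        return holds(formula[1], node, env) or holds(formula[2], node, env)
    if kind == "and":
        return holds(formula[1], node, env) and holds(formula[2], node, env)
    if kind == "dia":                      # <1>phi or <2>phi
        successor = node[formula[1]]       # index 1 = first, 2 = second
        return successor is not None and holds(formula[2], successor, env)
    if kind == "var":                      # unfold the bound definition
        return holds(env[formula[1]], node, env)
    if kind == "let":                      # let $X = ..., $Y = ... in body
        return holds(formula[2], node, {**env, **formula[1]})
    raise ValueError(f"unknown formula kind: {kind}")


# let $X = b | <2>$X in $X
SIBLING_B = ("let",
             {"X": ("or", ("name", "b"), ("dia", 2, ("var", "X")))},
             ("var", "X"))
```

For instance, holds(SIBLING_B, ("a", None, ("c", None, ("b", None, None)))) is
true because b is reachable through second successors. Variable shadowing
across nested binders is not treated carefully in this sketch.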

The combination of modalities and recursion makes the logic one of the most
expressive (yet decidable) logics known. For instance, regular tree grammars
can be expressed with the logic using recursion and (forward) modalities. The
combination of converse programs and recursion allows expressing properties
about ancestors of a node, for instance. The possibility of nesting recursive
formulas allows XPath expressions to be translated into the logic.

Cycle-Freeness Restriction There is a restriction on the use of recursive
formulas. Only formulas that are cycle-free are allowed. Intuitively, a formula
is cycle-free if it does not contain both a program and its converse inside the
same recursion. For instance, the formula

let $X = a | <-1>$X | <1>$X in $X

is not cycle-free since 1 and -1 occur in front of the same variable bound by
the same binder. A formula is cycle-free if one cannot find both a program and
its converse by starting from a variable and going up in the formula tree to the
binder of this variable. For instance, the following formula is cycle-free:

let $X = a & (let $X = b | <1>$X in $X) | <-1>$X in $X

since variable binders are properly nested and a program and its converse never
appear in front of the same variable bound by the same binder.
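
The informal criterion above can be turned into a simple syntactic check. The
Python sketch below is a simplified reading of that criterion and not the formal
definition of [Geneves et al., 2007]: for each variable and binder, it collects
the programs found between the occurrences of the variable and its binder, and
rejects the formula if a program and its converse both appear. Formulas are
nested tuples (a hypothetical encoding), and interactions between several
variables of the same binder are not analysed.

```python
def is_cycle_free(formula) -> bool:
    """Approximate cycle-freeness check following the informal criterion."""
    seen = {}  # (binder id, variable name) -> programs between occurrence and binder

    def walk(f, programs, bound):
        kind = f[0]
        if kind == "var":
            binder_id, depth = bound[f[1]]
            seen.setdefault((binder_id, f[1]), set()).update(programs[depth:])
        elif kind == "not":
            walk(f[1], programs, bound)
        elif kind in ("or", "and"):
            walk(f[1], programs, bound)
            walk(f[2], programs, bound)
        elif kind == "dia":                 # <p>phi with p in {1, 2, -1, -2}
            walk(f[2], programs + (f[1],), bound)
        elif kind == "let":                 # inner binders shadow outer ones
            inner = dict(bound)
            for name in f[1]:
                inner[name] = (id(f), len(programs))
            for definition in f[1].values():
                walk(definition, programs, inner)
            walk(f[2], programs, inner)
        # "T" and "name" have no sub-formulas to visit

    walk(formula, (), {})
    return all(not any(-p in progs for p in progs) for progs in seen.values())
```

Applied to the two formulas of this paragraph, the check rejects the first one
(1 and -1 are both collected for the same binder) and accepts the second one.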

Translations of XPath expressions and XML tree types into the logic always 
generate cycle-free formulas, whatever the translated XPath or XML type is. 
The cycle-freeness restriction only matters when one directly writes recursive 
logical formulas. From a theoretical perspective the cycle-freeness restriction 
predicate ::=
      select("query")
    | select("query", φ)
    | exists("query")
    | exists("query", φ)
    | type("f", l)
    | type("f", ...
    | forward_incompatible(φ, φ')
    | backward_incompatible(φ, φ')
    | element(φ)
    | attribute(φ)
    | descendant(φ)
    | exclude(φ)
    | added_element(φ, φ')
    | added_attribute(φ, φ')
    | non_empty("query", φ)
    | new_element_name("query", "f", "f'", l)
    | new_region("query", "f", "f'", ...
    | new_content("query", "f", ...
    | predicate-name(⟨φ⟩⊕)

Figure 5: Syntax of Predicates for XML Reasoning.



spec ::=
      φ                              formula (see Figure 4)
    | def; φ

def ::=
      predicate-name(⟨l⟩⊕) = φ'      custom definition
    | def; def                       list of definitions

Figure 6: Global Syntax for Specifying Problems.



comes from the fact that converse programs may interact with recursion in a
subtle manner such that the finite model property is lost. The cycle-freeness
restriction ensures that the negation of every formula can also be expressed in
the logic or, in other terms, that the logic is closed under negation and all
other boolean operations (a detailed discussion on this topic can be found in
[Geneves et al., 2007]).



Supported XPath Expressions The logic provides high-level constructions
for facilitating the formulation of problems involving XPath expressions. The
construct select("e", φ), where e is an XPath expression, provides a way of
embedding XPath expressions directly into the logic (e is automatically compiled
query ::=
      /path                          absolute path
    | path                           relative path
    | query | query                  union
    | query ∩ query                  intersection

path ::=
      path/path                      path composition
    | path[qualifier]                qualified path
    | a::nt                          step

qualifier ::=
      qualifier and qualifier        conjunction
    | qualifier or qualifier         disjunction
    | not(qualifier)                 negation
    | path                           path
    | path/@nt                       attribute path
    | @nt                            attribute step

nt ::=                               node test
      σ                              node label
    | *                              any node label

a ::=                                tree navigation axis
      self | child | parent
    | descendant | ancestor
    | descendant-or-self
    | ancestor-or-self
    | following-sibling
    | preceding-sibling
    | following | preceding

Figure 7: XPath Expressions.



into a logical formula, see [Geneves et al., 2007] for details on the
compilation technique). The second parameter φ denotes the context from which
the XPath is applied; it can be any formula. The other construct select("e") is
simply a shorthand for select("e", #), where # is the initial context node mark.
The syntax of supported XPath expressions is given on Figure 7. We observed
that, in practice, many XPath expressions contain syntactic sugars that can
also fit into this fragment. Figure 8 presents how our XPath parser rewrites
some commonly found XPath patterns into the fragment of Figure 7, where the
notation (a::nt)^k stands for the composition of k successive path steps of the
same form: a::nt/.../a::nt (k steps).

Supported XML Types The logic is expressive enough to allow for the encoding
of any regular tree grammar. The logical construction type("filename", start)
provides a convenient way of referring to tree grammars written in usual
notations like DTD, XML Schema, or Relax NG. The referred tree type is
automatically parsed and compiled into the logic, starting from the given start
symbol (which can be the root symbol or any other symbol defined by the tree
type).
nt[position() = 1]              ⇝  nt[not(preceding-sibling::nt)]
nt[position() = last()]         ⇝  nt[not(following-sibling::nt)]
nt[position() = k]   (k > 1)    ⇝  nt[(preceding-sibling::nt)^(k-1)]
count(path) = 0                 ⇝  not(path)
count(path) > 0                 ⇝  path
count(nt) > k        (k > 0)    ⇝  nt/(following-sibling::nt)^k
preceding-sibling::*[position() = last() and qualifier]
                                ⇝  preceding-sibling::*[not(preceding-sibling::*) and qualifier]

Figure 8: Syntactic Sugars and their Rewritings.
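
Two of these rewritings are easy to reproduce by string manipulation. The Python
sketch below is an illustration with hypothetical helper names and is not the
parser shipped with the solver: steps expands the (a::nt)^k notation, and
rewrite_position removes position() from nt[position() = k] following the first
and third rules of Figure 8.

```python
def steps(axis: str, nt: str, k: int) -> str:
    """Expand (axis::nt)^k into k successive steps joined by '/'."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return "/".join([f"{axis}::{nt}"] * k)


def rewrite_position(nt: str, k: int) -> str:
    """Rewrite nt[position() = k] into the fragment of Figure 7."""
    if k < 1:
        raise ValueError("positions start at 1")
    if k == 1:
        # First rule: no preceding sibling with the same node test.
        return f"{nt}[not(preceding-sibling::{nt})]"
    # Third rule: a chain of k-1 preceding-sibling steps must exist.
    return f"{nt}[{steps('preceding-sibling', nt, k - 1)}]"
```

For example, rewrite_position("b", 3) produces a qualifier made of two
successive preceding-sibling::b steps.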



3.3 Predicates 

We build on the aforementioned query and schema compilers, and define ad- 
ditional predicates that facilitate the formulation of decision problems at a 
higher level of abstraction. Specifically, these predicates are introduced as logi- 
cal macros with the goal of allowing system usage while focusing (only) on the 
XML-side properties, and keeping underlying logical issues transparent for the 
user. Ultimately, we regard the set of basic logical formulas (such as modal- 
ities and recursive binders) as an assembly language, to which predicates are 
translated. Some built-in predicates include: 



• the predicate exclude(φ), which is satisfiable iff there is no node that
satisfies φ in the whole tree. This predicate can be used for excluding
specific element names or even nodes selected by a given XPath expression.

• the predicate element(T), which builds the disjunction of all element names
occurring in T.

• the predicate descendant(φ), which forces the existence of a node satisfying
φ in the subtree, and predicate-name(⟨φ⟩⊕), which is a call to a custom
predicate, as explained in the next section.



3.4 Custom Predicates 

Following the spirit of predicates presented in the previous section, users may
also define their own custom predicates. The full syntax of XML logical
specifications to be used with the system is defined on Figure 6, where the
meta-syntax ⟨X⟩⊕ means one or more occurrences of X separated by commas. A
global problem specification can be any formula (as defined on Figure 4), or a
list of custom predicate definitions separated by semicolons and followed by a
formula. A custom predicate may have parameters that are instantiated with
actual formulas when the custom predicate is called (as shown on Figure 5). A
formula bound to a custom predicate may include calls to other predicates, but
not to the currently defined predicate (recursive definitions must be made
through the let binder shown on Figure 4).



4 Overview of the Background Theory 



The logic and its solver are formally described in [Geneves, 2006; Geneves et
al., 2007]. The logic is a modal logic of trees, more specifically an
alternation-free μ-calculus with converse for finite trees. The logic is
equipped with forward and backward modalities, which are notably useful for
capturing all XPath axes (including reverse axes). The logic is also equipped
with a fixed-point operator for expressing recursion in finite trees. An n-ary
fixed-point operator is also provided so that mutual recursion occurring in XML
types can be succinctly expressed in the logic. The logic is also able to
express any propositional property, for instance about node labels (XML element
and attribute names). Last but not least, the logic is closed under negation
[Geneves, 2006; Geneves et al., 2007], that is, the negation of any logical
formula can be expressed in the logic too (this property is essential for
checking XPath containment, which corresponds to logical implication). All these
features together (propositions, forward and backward modalities, recursion
through fixed-point operators, and boolean connectives) yield a logic of very
high expressive power. Actually, this logic is one of the most expressive yet
decidable logics known. It can express properties of regular tree languages.
Specifically, it is as expressive as tree automata (which notably provide the
foundation for the Relax NG language in the XML world) and monadic second-order
logic of finite trees (often referred to as WS2S or "MSO" in the literature)
[Thatcher and Wright, 1968; Doner, 1970]. However, the logical solver is
considerably (orders of magnitude) faster than solvers for monadic second-order
logic such as the MONA solver [Klarlund et al., 2001] (the MONA solver
nevertheless remains useful when one wants to write logical formulas using MSO
syntax). Technically, the truth status of a logical formula (satisfiable or
unsatisfiable) is automatically determined in exponential time, and more
specifically in time $2^{O(n)}$ where $n$ is proportional to (and smaller than)
the size of the logical formula [Geneves, 2006; Geneves et al., 2007]. In
comparison, the complexity of monadic second-order logic is much higher: it was
proved in the late 1960s that the best decision procedure for monadic
second-order logic is at least hyper-exponential in the size of the formula
[Thatcher and Wright, 1968; Doner, 1970], that is, not bounded by any stack of
exponentials. The tree logic described in this document currently offers the
best balance known between expressivity and complexity for decidability. The
astute reader may notice that the complexity of the logic is optimal since it
subsumes tree automata and less expressive logics such as CTL [Clarke and
Emerson, 1981], for instance.

XPath expressions and regular tree types can be linearly translated into the
logic. This observation allows generalizing the complexity of the algorithm for
solving the logic to a wide range of problems in the XML world.

The decision procedure for the logic is based on an inverse tableau method
that searches for a satisfying tree. The algorithm has been proved sound and
complete in [Geneves, 2006; Geneves et al., 2007]. The solver is implemented
using symbolic techniques like binary decision diagrams (BDDs) [Bryant, 1986].
It also uses numerous optimization techniques such as on-the-fly formula
normalization and simplification, conjunctive partitioning, and early
quantification.

Geneves, Layaida, & Quint

Finally, another benefit of this method (illustrated in Section 2) is that the
solver can be used to generate an example (or counter-example) XML tree for
a given property, which makes it possible, for instance, to reproduce a
program's bug in the developer environment, independently from the logical
solver.